Talk by Yan Karklin, Center for Neural Science, NYU. Given to the Redwood Center for Theoretical Neuroscience at UC Berkeley.
Abstract. Efficient coding provides a powerful principle for explaining early sensory processing. Among the successful applications of this theory are models that provide functional explanations for neural responses in the primary visual cortex (Bell & Sejnowski, 1995; Olshausen & Field, 1996) and in the auditory nerve (Smith & Lewicki, 2006). Two of the challenges facing these models are mapping abstracted computations onto noisy, nonlinear, multistage neural implementations, and capturing the complexity of natural signals that necessitates such hierarchical neural processing. For example, statistical models of vision often suggest a direct transformation from the image to a set of oriented features, but this seems to bypass earlier stages -- the retinal output, organized into intricate mosaics of center-surround receptive fields -- and ignores their nonlinear response properties. Auditory models of efficient coding yield filters consistent with the cochlear output, but little work has been done on learning hierarchical representations that can explain downstream processing of complex sounds. In this talk I will describe two recent projects that address some of these issues. First, I will show that an efficient coding model that incorporates ingredients critical to biological computation -- input and output noise, nonlinear response functions, and metabolic constraints -- can predict the basic properties of retinal processing. Specifically, we develop numerical methods for simultaneously optimizing linear filters and response nonlinearities of a population of model neurons so as to maximize information transmission in the presence of noise and metabolic costs. When the model includes biologically realistic levels of noise, the predicted filters are center-surround and the nonlinearities are rectifying, consistent with properties of retinal ganglion cells. 
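The information-maximization objective described above can be illustrated in a much simpler setting. The sketch below assumes a single linear neuron with Gaussian inputs and additive Gaussian input and output noise. It drops the response nonlinearity entirely and stands in for the metabolic cost with a penalty on response variance. The names `info_bits`, `objective`, and the penalty weight `lam` are hypothetical. For a response y = w·(x + n_in) + n_out with x ~ N(0, C), the information is I = ½ log2((wᵀCw + σ_in²‖w‖² + σ_out²) / (σ_in²‖w‖² + σ_out²)). The example shows why input noise matters: raising the filter gain overcomes output noise, but it cannot push information past the ceiling set by input noise.

```python
import math

def quad_form(w, C):
    """Return w^T C w for a vector w (list) and a matrix C (list of lists)."""
    n = len(w)
    return sum(w[i] * C[i][j] * w[j] for i in range(n) for j in range(n))

def info_bits(w, C, sigma_in, sigma_out):
    """Mutual information, in bits, between a Gaussian input x ~ N(0, C) and the
    response y = w . (x + n_in) + n_out, where n_in is white input noise with
    variance sigma_in**2 and n_out is output noise with variance sigma_out**2."""
    noise_var = sigma_in ** 2 * sum(v * v for v in w) + sigma_out ** 2
    total_var = quad_form(w, C) + noise_var
    return 0.5 * math.log2(total_var / noise_var)

def objective(w, C, sigma_in, sigma_out, lam):
    """Information minus a hypothetical metabolic cost: lam times response variance."""
    noise_var = sigma_in ** 2 * sum(v * v for v in w) + sigma_out ** 2
    total_var = quad_form(w, C) + noise_var
    return info_bits(w, C, sigma_in, sigma_out) - lam * total_var

# Toy example: 3 pixels with exponentially decaying correlations (assumed).
RHO, SIGMA_IN, SIGMA_OUT = 0.5, 0.3, 1.0
C = [[RHO ** abs(i - j) for j in range(3)] for i in range(3)]
w = [-0.5, 1.0, -0.5]  # a hypothetical center-surround-like filter

# Raising the gain beats output noise, but input noise caps the information.
limit = 0.5 * math.log2(1 + quad_form(w, C) / (SIGMA_IN ** 2 * sum(v * v for v in w)))
for gain in (1, 10, 1000):
    scaled = [gain * v for v in w]
    print(gain, info_bits(scaled, C, SIGMA_IN, SIGMA_OUT), "limit:", limit)
```

In the setting the abstract describes, an optimizer would adjust the filters and nonlinearities of a whole population to maximize a quantity of this kind. This toy version only evaluates the objective for one fixed linear filter.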
The optimized model yields two populations of neurons, characterized by On- and Off-center responses, which independently tile the visual space, and it even predicts an asymmetry observed in the primate retina: Off-center neurons are more numerous and have filters with smaller spatial extent.

In the case of auditory coding, I will present Hierarchical Spike Coding, a two-layer probabilistic generative model for complex acoustic structure. The first layer consists of a sparse spiking representation that encodes the sound using kernels positioned precisely in time and frequency. Patterns in the positions of first-layer spikes are learned from the data: on a coarse scale, statistical regularities are encoded by a second-layer spiking representation, while fine-scale structure is captured by recurrent interactions within the first layer. When the model is fitted to speech data, the second-layer acoustic features include harmonic stacks, sweeps, frequency modulations, and precise temporal onsets, which can be composed to represent complex acoustic events. Unlike spectrogram-based methods, the model gives a probability distribution over sound pressure waveforms. This allows us to use the second-layer representation to synthesize sounds directly and to perform model-based denoising, where we demonstrate a significant improvement over standard methods. (This is joint work with Chaitanya Ekanadham and Eero Simoncelli.)
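To make the idea of a sparse spike code concrete, the sketch below encodes a 1-D signal as a list of "spikes" (kernel index, time, amplitude) using greedy matching pursuit. Matching pursuit is a swapped-in standard technique, not necessarily the inference method used in Hierarchical Spike Coding. The two toy kernels stand in for kernels at different frequencies, and the helpers `normalize` and `matching_pursuit` are hypothetical names. The sketch covers only a first-layer-style encoding: it has no second layer and no recurrent interactions.

```python
import math

def normalize(kernel):
    """Scale a kernel to unit Euclidean norm."""
    n = math.sqrt(sum(v * v for v in kernel))
    return [v / n for v in kernel]

def matching_pursuit(signal, kernels, max_spikes=10, tol=1e-9):
    """Greedily explain `signal` as a sum of shifted, scaled unit-norm kernels.

    Returns (spikes, residual), where each spike is (kernel_index, time, amplitude).
    """
    residual = list(signal)
    spikes = []
    for _ in range(max_spikes):
        best = (0.0, None, None)  # (amplitude, kernel index, time shift)
        for k_idx, kern in enumerate(kernels):
            for t in range(len(residual) - len(kern) + 1):
                # Correlation of the residual with this kernel placed at time t.
                c = sum(kern[j] * residual[t + j] for j in range(len(kern)))
                if abs(c) > abs(best[0]):
                    best = (c, k_idx, t)
        amp, k_idx, t = best
        if k_idx is None or abs(amp) < tol:
            break  # nothing left worth encoding
        # Subtract the chosen kernel's contribution and record the spike.
        for j, v in enumerate(kernels[k_idx]):
            residual[t + j] -= amp * v
        spikes.append((k_idx, t, amp))
    return spikes, residual

# Toy data: two kernels (index plays the role of a frequency channel).
kernels = [normalize([1, 2, 1]), normalize([1, 0, -1])]
signal = [0.0] * 20
for j, v in enumerate(kernels[0]):
    signal[2 + j] += 3.0 * v    # kernel 0 at time 2, amplitude 3
for j, v in enumerate(kernels[1]):
    signal[12 + j] += -2.0 * v  # kernel 1 at time 12, amplitude -2

spikes, residual = matching_pursuit(signal, kernels)
print(spikes)
```

Because the two events in this toy signal do not overlap and the kernels are orthogonal, the greedy search recovers both spikes and leaves an essentially zero residual. Real sounds with overlapping events are much harder, which is part of what motivates learning structure in the spike patterns themselves.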